Flexistipes sinusarabici Fiala et al. 2000 is the type species of the genus Flexistipes in the family Deferribacteraceae. The species is of interest because of its isolated phylogenetic location in a genomically under-characterized region of the tree of life, and because of its origin from a multiply extreme environment: the Atlantis Deep brines of the Red Sea, where it had to cope with high temperatures, high salinity, and high concentrations of heavy metals. This is the fourth completed genome sequence of a type strain of the family Deferribacteraceae to be published. The 2,526,590 bp long genome with its 2,346 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.



Introduction 



Strain MAS10ᵀ (= DSM 4947 = ATCC 49648) is the type strain of Flexistipes sinusarabici [1,2], which is the type and only species of the genus Flexistipes [1,2]. The strain was first isolated from the Atlantis II Deep brines of the Red Sea [1], together with four related isolates. The generic name derives from the Latin words flexus, a bending, turning, winding, and stipes, a branch of a tree, a stick [1]. The species epithet is derived from the Latin words sinus, a curve or fold in land, a gulf, and arabicus, Arabic, describing the place of isolation [1]. Since its isolation in the late 1980s, no closely related bacterium (16S rRNA identity >90%) has been described. The resistance of the strain to moderate heat, high salt concentrations, and heavy metals [1] makes it a potentially interesting target for extremophile biotechnology. Here we present a summary classification and a set of features for F. sinusarabici MAS10ᵀ, together with the description of the complete genomic sequencing and annotation.
Figure 1. Phylogenetic tree highlighting the position of F. sinusarabici relative to the type strains of the other species within the phylum "Deferribacteres". The tree was inferred from 1,459 aligned characters [7,8] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [9]. Rooting was done initially using the midpoint method [10] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 250 ML bootstrap replicates [11] (left) and from 1,000 maximum parsimony bootstrap replicates [12] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [14-16].
Classification and features

A representative genomic 16S rRNA sequence of strain MAS10ᵀ was compared using NCBI BLAST [3,4] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [5], and the relative frequencies of taxa and keywords (reduced to their stem [6]) were determined, weighted by BLAST scores. The most frequently occurring genera were Acidithiobacillus (60.0%), Deferribacter (26.8%), Flexistipes (8.2%), Desulfuromonas (2.2%) and Calditerrivibrio (1.8%) (80 hits in total). Regarding the single hit to sequences from members of the species, the average identity within HSPs was 98.0%, whereas the average coverage by HSPs was 96.9%. Among all other species, the one yielding the highest score was Deferribacter abyssi (AJ515881), which corresponded to an identity of 89.7% and an HSP coverage of 86.4%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was FR744611 ('succession potential reducers nitrate-treated facility determined temperature and nitrate availability production water Halfdan oil field clone PWB039'), which showed an identity of 96.7% and an HSP coverage of 93.1%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'microbi' (3.9%), 'acid' (3.4%), 'sediment' (3.3%), 'water' (3.0%) and 'oil' (2.4%) (170 hits in total). The most frequently occurring keywords within the labels of those environmental samples which yielded hits of a higher score than the highest-scoring species were 'avail, determin, facil, field, halfdan, nitrat, nitrate-tr, oil, potenti, product, reduc, success, temperatur, water' (7.1%) (1 hit in total). While these keywords fit the marine environment from which strain MAS10ᵀ originated, they also point to sediments and oil fields, which have so far not been considered habitats for F. sinusarabici.
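The score-weighted taxon and keyword frequencies described above can be illustrated with a short sketch. It is a minimal reading of "weighted by BLAST scores", not the pipeline actually used: it assumes each hit is a (label, score) pair, keeps only the 250 best-scoring hits, and reports each label's share of the summed scores. The function name `weighted_frequencies` and the toy input are hypothetical, and keyword stemming [6] is not reproduced.

```python
from collections import defaultdict


def weighted_frequencies(hits, top_n=250):
    """Return each label's percentage share of the summed BLAST scores.

    Assumption (hypothetical helper): `hits` is an iterable of (label, score)
    pairs, e.g. (genus name, bit score). Only the `top_n` best-scoring hits
    are considered, mirroring the "best 250 hits" setting in the text.
    """
    best = sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_n]
    totals = defaultdict(float)
    for label, score in best:
        totals[label] += score  # each occurrence contributes its score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


if __name__ == "__main__":
    # Toy scores for illustration only, not the actual Greengenes hits.
    toy_hits = [("Deferribacter", 300.0), ("Flexistipes", 100.0)]
    print(weighted_frequencies(toy_hits))
```

Summing scores per label means a few high-scoring hits can outweigh many weak ones, which is the intended effect of score weighting.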

Figure 1 shows the phylogenetic neighborhood of F. sinusarabici MAS10ᵀ in a 16S rRNA based tree. The
sequences of the two identical 16S rRNA gene cop- 
ies in the genome differ by two nucleotides from 
the previously published 16S rRNA sequence 
M59231, which contains 25 ambiguous base calls. 
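Because M59231 contains 25 ambiguous base calls, the count of two differing nucleotides depends on how ambiguous positions are treated. The text does not state the convention used; the sketch below assumes pre-aligned sequences and skips gaps and IUPAC ambiguity codes, so that only unambiguous mismatches are counted. The helper names are hypothetical.

```python
# IUPAC codes for ambiguous nucleotide positions (anything other than A, C, G, T).
AMBIGUOUS_CODES = set("RYSWKMBDHVN")
UNAMBIGUOUS = set("ACGT")


def count_ambiguous(seq: str) -> int:
    """Count ambiguous base calls (IUPAC ambiguity codes) in a sequence."""
    return sum(1 for base in seq.upper() if base in AMBIGUOUS_CODES)


def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count aligned columns where both bases are A/C/G/T and differ.

    Columns with a gap or an ambiguity code in either sequence are skipped,
    so ambiguous calls never count as differences under this convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


if __name__ == "__main__":
    print(count_ambiguous("ACGTNRY"), count_differences("ACGTA", "ACNTC"))
```

Counting ambiguous calls as mismatches instead would inflate the difference count toward the number of ambiguous positions.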

Cells of strain MAS10ᵀ are straight to bent rods, about 0.3 µm wide and 4-50 µm long (Figure 2) [1]. F. sinusarabici was described as non-motile [1]. Spore formation was not observed [1]. MAS10ᵀ cells stain Gram-negative, and growth is strictly anaerobic, with the best growth occurring within a temperature range of 45-50°C and a minimum doubling time of 8.5 hours [1]. The optimal pH range for the strain is pH 6-8 [1]. Strain MAS10ᵀ requires at least 3% NaCl for growth, but also grows at salt concentrations as high as 10% [1]. The organism prefers complex growth substrates such as yeast extract, meat extract, peptone and tryptone, while formate, lactate, citrate, malate, carbohydrates, amino acids and alcohols do not support cell growth [1]. Strain MAS10ᵀ shows an unusual resistance to the transcription inhibitor rifampicin [1], a trait that is, however, also common among spirochetes.

Chemotaxonomy 

The chemotaxonomic data for MAS10ᵀ are relatively sparse: no information on cell wall structure, quinones or polar lipids is available. The fatty acid composition is dominated by saturated unbranched acids: C15 (23.3%), C14 (15.1%), C17 (12.6%), with some branched acids iso-C14 (10.2%), anteiso-C15 (10.2%), iso-C16 (4.1%), iso-C15 (3.6%), and few unsaturated acids C18:1 (9.9%), C16:1 (2.8%), C17:1 (2.5%) [1].

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its phylogenetic position [28], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [29]. The genome project is deposited in the Genomes OnLine Database [13] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation 

F. sinusarabici MAS10ᵀ, DSM 4947, was grown anaerobically in DSMZ medium 524 (Flexistipes Medium) [30] at 47°C. DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol recommended by the manufacturer, but adding 10 µl proteinase K for an extended lysis of one hour at 58°C. DNA is available through the DNA Bank Network [31].
Table 1. Classification and general features of F. sinusarabici MAS10ᵀ according to the MIGS recommendations [17] and the NamesforLife database [18].

MIGS ID  | Property               | Term                                                                            | Evidence code
         | Current classification | Domain Bacteria                                                                 | TAS [19]
         |                        | Phylum "Deferribacteres"                                                        | TAS [20,21]
         |                        | Class "Deferribacteres"                                                         | TAS [22,23]
         |                        | Order Deferribacterales                                                         | TAS [22,24]
         |                        | Family Deferribacteraceae                                                       | TAS [22,25]
         |                        | Genus Flexistipes                                                               | TAS [1,2]
         |                        | Species Flexistipes sinusarabici                                                | TAS [1,2]
         |                        | Type strain MAS10                                                               | TAS [1]
         | Gram stain             | negative                                                                        | TAS [1]
         | Cell shape             | straight to acutely bent rods                                                   | TAS [1]
         | Motility               | non-motile                                                                      | TAS [1]
         | Sporulation            | none                                                                            | TAS [1]
         | Temperature range      | 30-53°C, moderately thermophilic                                                | TAS [1]
         | Optimum temperature    | 45-50°C                                                                         | TAS [1]
         | Salinity               | at least 3% NaCl, grows with up to 18% NaCl                                     | TAS [1]
MIGS-22  | Oxygen requirement     | strictly anaerobic                                                              | TAS [1]
         | Carbon source          | complex organic components like yeast extract, meat extract, peptone, tryptone | TAS [1]
         | Energy metabolism      | heterotrophic                                                                   | TAS [1]
MIGS-6   | Habitat                | marine, deep brine water                                                        | TAS [1]
MIGS-15  | Biotic relationship    | free-living                                                                     | TAS [1]
MIGS-14  | Pathogenicity          | none                                                                            | TAS [1]
         | Biosafety level        | 1                                                                               | TAS [26]
         | Isolation              | interface between upper brine layer and deep sea water                         | TAS [1]
MIGS-4   | Geographic location    | Atlantis II Deep brines, Red Sea                                                | TAS [1]
MIGS-5   | Sample collection time | 1987 or before                                                                  | NAS
MIGS-4.1 | Latitude               | 21.37                                                                           | TAS [1]
MIGS-4.2 | Longitude              | 38.07                                                                           | TAS [1]
MIGS-4.3 | Depth                  | 2,000-2,200 m                                                                   | TAS [1]
MIGS-4.4 | Altitude               | -2,000 to -2,200 m                                                              | TAS [1]

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable 
Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted prop- 
erty for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [27]. 
Table 2. Genome sequencing project information

MIGS ID   | Property                   | Term
MIGS-31   | Finishing quality          | Finished
MIGS-28   | Libraries used             | Four genomic libraries: one 454 pyrosequence standard library, two PE libraries (3 kb, 15.5 kb insert size), one Illumina library
MIGS-29   | Sequencing platforms       | Illumina GAii, 454 GS FLX Titanium
MIGS-31.2 | Sequencing coverage        | 162.0 × Illumina; 37.9 × pyrosequence
MIGS-30   | Assemblers                 | Newbler version 2.3, Velvet version 0.7.63, phrap SPS-4.24
MIGS-32   | Gene calling method        | Prodigal 1.4, GenePRIMP
          | INSDC ID                   | CP002858
          | GenBank Date of Release    | June 17, 2011
          | GOLD ID                    | Gc01819
          | NCBI project ID            | 45817
          | Database: IMG-GEBA         | 2505679008
MIGS-13   | Source material identifier | DSM 4947
          | Project relevance          | Tree of Life, GEBA



Genome sequencing and assembly 

The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [32]. Pyrosequencing reads were assembled using the Newbler assembler (Roche). The initial Newbler assembly, consisting of 175 contigs in two scaffolds, was converted into a phrap [33] assembly by making fake reads from the consensus, to collect the read pairs in the 454 paired end library. Illumina GAii sequencing data (489.7 Mb) were assembled with Velvet [34], and the consensus sequences were shredded into 2.0 kb overlapped fake reads and assembled together with the 454 data. The 454 draft assembly was based on 170.4 Mb of 454 draft data and all of the 454 paired end data. Newbler parameters are -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed software package [33] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution [32], Dupfinisher [35], or by sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 605 additional reactions and 15 shatter libraries were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI [36]. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454 sequencing platforms provided 199.9 × coverage of the genome. The final assembly contained 248,918 pyrosequence and 395,536,860 Illumina
reads. 
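Converting a consensus into "fake reads", as done above for both the Newbler and the Velvet consensus sequences, amounts to tiling the sequence with overlapping windows. The sketch below is a minimal illustration, not JGI's actual tool: the function name `shred` and the 500 bp overlap are assumptions; only the 2.0 kb read length comes from the text.

```python
def shred(consensus: str, read_len: int = 2000, overlap: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length fake reads.

    read_len follows the 2.0 kb stated in the text; the 500 bp overlap is a
    hypothetical value, as the actual overlap is not given. The last read may
    be shorter than read_len if the sequence length does not tile evenly.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must satisfy 0 <= overlap < read_len")
    if not consensus:
        return []
    step = read_len - overlap
    reads = []
    start = 0
    while True:
        reads.append(consensus[start:start + read_len])
        if start + read_len >= len(consensus):
            break  # this read already reaches the end of the consensus
        start += step
    return reads


if __name__ == "__main__":
    print(shred("ACGTACGTAC", read_len=4, overlap=2))
```

Overlapping the pseudo-reads lets the downstream assembler join them back into a continuous consensus while treating them like ordinary reads alongside the 454 data.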

Genome annotation 

Genes were identified using Prodigal [37] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [38]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [39].

Genome properties 

The genome consists of a 2,526,590 bp long circular chromosome with a G+C content of 38.3% (Table 3 and Figure 3). Of the 2,399 genes predicted, 2,346 were protein-coding genes and 53 were RNA genes; 85 pseudogenes were also identified. The majority of the protein-coding genes (1,803, corresponding to 75.2% of all genes) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is
presented in Table 4. 
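The percentages in Table 3 can be recomputed from the absolute counts. The short sketch below copies its constants from Table 3; the helper name `percent` is hypothetical, and it assumes, as the tabulated values indicate, that base-pair attributes are expressed relative to the genome size and gene counts relative to the 2,399 total genes.

```python
GENOME_SIZE_BP = 2_526_590
TOTAL_GENES = 2_399

# Counts copied from Table 3. Base-pair attributes are taken relative to the
# genome size, gene counts relative to the total number of genes.
BP_ATTRIBUTES = {
    "DNA coding region": 2_179_830,
    "DNA G+C content": 967_539,
}
GENE_ATTRIBUTES = {
    "RNA genes": 53,
    "Protein-coding genes": 2_346,
    "Pseudo genes": 85,
    "Genes with function prediction": 1_803,
    "Genes in paralog clusters": 242,
    "Genes assigned to COGs": 1_924,
    "Genes assigned Pfam domains": 1_978,
    "Genes with signal peptides": 366,
    "Genes with transmembrane helices": 579,
}


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals as in Table 3."""
    return round(100.0 * part / whole, 2)


if __name__ == "__main__":
    for name, value in BP_ATTRIBUTES.items():
        print(f"{name}: {percent(value, GENOME_SIZE_BP):.2f}%")
    for name, value in GENE_ATTRIBUTES.items():
        print(f"{name}: {percent(value, TOTAL_GENES):.2f}%")
```

Run as a script, it reproduces the tabulated percentages, for example 75.16% for the 1,803 genes with function prediction. That share is relative to all 2,399 genes; relative to the 2,346 protein-coding genes it would be about 76.9%.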
Table 3. Genome Statistics

Attribute                        | Value     | % of Total
Genome size (bp)                 | 2,526,590 | 100.00%
DNA coding region (bp)           | 2,179,830 | 86.28%
DNA G+C content (bp)             | 967,539   | 38.29%
Number of replicons              | 1         |
Extrachromosomal elements        | 0         |
Total genes                      | 2,399     | 100.00%
RNA genes                        | 53        | 2.21%
rRNA operons                     | 2         |
Protein-coding genes             | 2,346     | 97.79%
Pseudo genes                     | 85        | 3.54%
Genes with function prediction   | 1,803     | 75.16%
Genes in paralog clusters        | 242       | 10.09%
Genes assigned to COGs           | 1,924     | 80.20%
Genes assigned Pfam domains      | 1,978     | 82.45%
Genes with signal peptides       | 366       | 15.26%
Genes with transmembrane helices | 579       | 24.14%
CRISPR repeats                   | 0         |





Insight into the genome sequence 
Comparative genomics 

As no genome sequence of Deferribacter abyssi, the species yielding the highest BLAST score, was available, the following comparative analyses were done with D. desulfuricans [14] (GenBank AP011529, AP011530) and Calditerrivibrio nitroreducens (GenBank CP002347, CP002348) [16], the phylogenetically closest organisms for which a genome sequence was available. The genomes of F. sinusarabici, D. desulfuricans and C. nitroreducens are similar in size (2.5 Mb, 2.5 Mb and 2.2 Mb, respectively) and have a similar, quite low G+C content (38%, 30% and 36%, respectively). Whereas F. sinusarabici has no plasmid, D. desulfuricans harbors a 5.9 kb plasmid and C. nitroreducens contains a 30.8 kb megaplasmid.

An estimate of the overall similarity between the three genomes was generated with the GGDC (Genome-to-Genome Distance Calculator) [40,41]. This system calculates the distances by comparing the genomes to obtain HSPs (high-scoring segment pairs) and inferring distances from a set of formulas (1, HSP length / total length; 2, identities / HSP length; 3, identities / total length). Table 5 shows the results of the pairwise comparison between the three genomes.
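The three formulas can be written out as ratios over the HSP statistics of a genome pair. The sketch below returns them as percentages, the form used in Table 5; GGDC itself derives distances from such ratios, which this sketch does not attempt. It assumes that "total length" is the mean of the two genome lengths, following the wording "of the average of both genome lengths" used with Table 5. The function name and the toy numbers are hypothetical.

```python
def ggdc_percentages(hsp_length: float, identities: float,
                     genome_a_len: float, genome_b_len: float) -> tuple[float, float, float]:
    """Return formulas 1-3 as percentages for one genome pair.

    Assumption: "total length" is the mean of the two genome lengths.
    """
    total_length = (genome_a_len + genome_b_len) / 2
    f1 = 100.0 * hsp_length / total_length  # 1, HSP length / total length
    f2 = 100.0 * identities / hsp_length    # 2, identities / HSP length
    f3 = 100.0 * identities / total_length  # 3, identities / total length
    return f1, f2, f3


if __name__ == "__main__":
    # Toy values, not the actual HSP statistics behind Table 5.
    print(ggdc_percentages(100, 80, 1000, 1000))
```

Because formula 3 equals formula 1 times formula 2 (divided by 100), the rows of Table 5 can be cross-checked: for F. sinusarabici versus D. desulfuricans, 5.9 × 83.2 / 100 ≈ 4.9.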

The comparison of the F. sinusarabici and D. desulfuricans genomes revealed that 5.9% of the average of both genome lengths is covered by HSPs. The identity within these HSPs was 83.2%, whereas the identity over the whole genome was only 4.9%. Similar results were inferred for F. sinusarabici and C. nitroreducens (Table 5). The genomes of D. desulfuricans and C. nitroreducens show a markedly higher degree of similarity, with 9.9% of the average genome length covered by HSPs of 83.3% identity. The identity over the whole length of the genomes was 8.3%. These values corroborate the relationship between the three organisms as shown in the 16S rRNA-based phylogenetic tree in Figure 1, as there is no bootstrap support for F. sinusarabici being more closely related to either C. nitroreducens or D. desulfuricans.
Figure 3. Graphical circular map of the chromosome. From outside to the center: genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.



The fraction of shared genes in the three genomes is shown in a Venn diagram (Figure 4). The numbers of pairwise shared genes were calculated with the phylogenetic profiler function of the IMG/ER platform [33]. The homologous genes within the genomes were detected with a maximum E-value of 10^-5 and a minimum identity of 30%. Roughly 61% of all genes in the genomes (1,400 genes) are shared by all three genomes, with about equal numbers of genes (224 and 246) shared on a pairwise basis by F. sinusarabici and D. desulfuricans or by D. desulfuricans and C. nitroreducens, respectively, and to the exclusion of the third genome. Among the 567 unique genes of F. sinusarabici that have no detectable homologs in the genomes of D. desulfuricans and C. nitroreducens (under the sequence similarity thresholds used for the comparison), the 86 genes (3.7% of all F. sinusarabici protein-coding genes) encoding
transposases appear to be noteworthy. 
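The set logic behind such a comparison can be sketched as follows. IMG/ER's phylogenetic profiler works on its own precomputed homology data; this is only an illustration of partitioning one genome's genes under the thresholds stated above (E-value at most 10^-5, identity at least 30%). The `Hit` record, the function names and the genome labels are hypothetical, and a full three-way Venn diagram would require running the partition from the perspective of each genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One homology search hit (hypothetical record layout)."""
    query_gene: str      # gene of the reference genome
    subject_genome: str  # genome in which the homolog was found
    evalue: float
    identity: float      # percent identity


def genes_with_homologs(hits, genome, max_evalue=1e-5, min_identity=30.0):
    """Reference genes with at least one hit in `genome` passing both thresholds."""
    return {
        h.query_gene
        for h in hits
        if h.subject_genome == genome
        and h.evalue <= max_evalue
        and h.identity >= min_identity
    }


def profile(genes, hits, other_a, other_b):
    """Partition the reference genes by presence of homologs in two other genomes."""
    in_a = genes_with_homologs(hits, other_a)
    in_b = genes_with_homologs(hits, other_b)
    return {
        "shared_by_all": len(genes & in_a & in_b),
        "shared_only_with_" + other_a: len((genes & in_a) - in_b),
        "shared_only_with_" + other_b: len((genes & in_b) - in_a),
        "unique": len(genes - in_a - in_b),
    }
```

Applied to F. sinusarabici, the "unique" bin corresponds to the genes without detectable homologs in the two other genomes under these thresholds; hits failing either threshold simply do not count as homologs.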
Table 4. Number of genes associated with the general COG functional categories

Code | Value     | %age      | Description
J    | illegible | illegible | Translation, ribosomal structure and biogenesis
A    | illegible | illegible | RNA processing and modification
K    | illegible | 4.0       | Transcription
L    | illegible | illegible | Replication, recombination and repair
B    | illegible | illegible | Chromatin structure and dynamics
D    | illegible | illegible | Cell cycle control, cell division, chromosome partitioning
Y    | 0         | 0.0       | Nuclear structure
V    | illegible | illegible | Defense mechanisms
T    | 115       | 5.5       | Signal transduction mechanisms
M    | illegible | illegible | Cell wall/membrane/envelope biogenesis
N    | 36        | 1.7       | Cell motility
Z    | 0         | 0.0       | Cytoskeleton
W    | 0         | 0.0       | Extracellular structures
U    | 62        | 3.0       | Intracellular trafficking, secretion, and vesicular transport
O    | illegible | illegible | Posttranslational modification, protein turnover, chaperones
C    | 174       | 8.3       | Energy production and conversion
G    | illegible | illegible | Carbohydrate transport and metabolism
E    | illegible | illegible | Amino acid transport and metabolism
F    | 52        | 2.5       | Nucleotide transport and metabolism
H    | 115       | 5.5       | Coenzyme transport and metabolism
I    | 60        | 2.9       | Lipid transport and metabolism
P    | 86        | 4.1       | Inorganic ion transport and metabolism
Q    | 32        | 1.5       | Secondary metabolites biosynthesis, transport and catabolism
R    | 246       | 11.8      | General function prediction only
S    | 145       | 7.0       | Function unknown
-    | 475       | 19.8      | Not in COGs


Table 5. Pairwise comparison of F. sinusarabici, D. desulfuricans and C. nitroreducens using the GGDC calculator.

Genome 1         | Genome 2         | 1, HSP length / total length [%] | 2, identities / HSP length [%] | 3, identities / total length [%]
F. sinusarabici  | D. desulfuricans | 5.9                              | 83.2                           | 4.9
F. sinusarabici  | C. nitroreducens | 5.1                              | 83.3                           | 4.3
D. desulfuricans | C. nitroreducens | 9.9                              | 83.3                           | 8.3
A remarkable difference between the compared organisms is their motility. Whereas F. sinusarabici is described as non-motile, D. desulfuricans is motile by twitching [14] and C. nitroreducens is also described as motile [16]. The mechanism of twitching motility is still unknown, but movement across surfaces is thought to be caused by the extension and retraction of type IV pili. A set of genes responsible for twitching motility has been identified in several organisms; in Pseudomonas aeruginosa, a gene cluster involved in pilus biosynthesis and twitching motility was characterized, and the products of this gene cluster show a high degree of sequence similarity to the chemotaxis (che) proteins of enterics and of the gliding bacterium Myxococcus xanthus [42]. A closer look into the genome sequences of F. sinusarabici, D. desulfuricans and C. nitroreducens revealed the presence of different sets of genes coding for chemotaxis proteins. In contrast to D. desulfuricans and C. nitroreducens, F. sinusarabici lacks four che genes (cheB, cheR, cheV, cheW). In P. aeruginosa, a mutation in the pilI gene, a homolog of cheW, led to a block in pilus production [42]. It is conceivable that the missing cheW gene in F. sinusarabici is responsible for the non-motility of the cells, despite the rather large number of 36 genes annotated in the cell motility category of Table 4.



[Figure 4: Venn diagram of the protein sets of D. desulfuricans (2,374), F. sinusarabici (2,346) and C. nitroreducens (2,127)]
Figure 4. Venn diagram depicting the intersections of protein sets (total number of derived protein 
sequences in parentheses) of F. sinusarabici, D. desulfuricans and C. nitroreducens. 